ABSTRACT

Following Jureckova (1981) we introduce a finite-sample measure of performance of regression estimators based on tail behavior. The least-squares estimator is studied in detail, and we find that it may achieve good tail performance under strictly Gaussian conditions. However, the tail performance of the least-squares estimator is found to be extremely poor in the case of heavy-tailed error distributions or when leverage points are present. Further analysis of the least-squares estimator with light-tailed errors indicates the strong influence of the design matrix in determining tail performance.

Turning to the tail behavior of various robust estimators of the parameters of the linear model, we focus on tail performance under heavy (algebraic) tailed errors. The $\ell_1$ estimator is seen to be a leading case: we find a simple characterization of its tail behavior in terms of the design configuration and show that a broad class of M-estimators have the same performance.

Perhaps most significantly, it is shown that our finite-sample measure of tail performance is, for heavy-tailed error distributions, essentially the same as the finite-sample concept of breakdown point introduced by Donoho and Huber (1983). This finding provides an important probabilistic interpretation of the breakdown point and clarifies the role of tail behavior as a quantitative measure of robustness. This link is further explored for high-breakdown regression estimators, including Rousseeuw's (1982) least-median-of-squares estimator.


* This research was initiated while Jureckova was Miller Visiting Professor at the University of Illinois. Support of NSF Grants SES-8707169 and DMS 88-02555 is also gratefully acknowledged.


1. Introduction

Several authors, including Bahadur (1967) and Sievers (1978), have studied measures of performance for estimators $T_n = T_n(X_1,\dots,X_n)$ of a location parameter $\theta$ based on the tail probability $P_\theta(|T_n-\theta|>a)$ for a fixed $a$ as $n\to\infty$. In the location model with independent observations $X_1,\dots,X_n$ from an absolutely continuous, symmetric distribution function $F(x-\theta)$ with density $f(z)=f(-z)>0$, $z\in\mathbb{R}^1$, Jureckova (1981) considered the measure of performance

$$B(a,T_n) = \frac{-\log P_\theta(|T_n-\theta|>a)}{-\log(1-F(a))} \qquad (1.1)$$

for fixed $n$ as $a\to\infty$. She showed that for any [reasonable] translation equivariant $T_n$,

$$1 \le \underline{B} = \liminf_{a\to\infty} B(a,T_n) \le \limsup_{a\to\infty} B(a,T_n) = \overline{B} \le n. \qquad (1.2)$$

Further, it was shown that both bounds can be attained by the sample mean $T_n=\bar X_n$. For $F$'s with exponential tails, $\bar X_n$ attains optimal tail performance: the logarithm of the probability of $\{|T_n-\theta|>a\}$ tends to $-\infty$ $n$ times faster than the logarithm of the probability that a single observation from $F$ exceeds $a$. For $F$'s with algebraic tails, by contrast, the probability that $|\bar X_n-\theta|>a$ tends to zero no faster than the probability that a single observation from $F$ exceeds $a$. This striking lack of robustness of $\bar X_n$, i.e. the sensitivity of its tail performance to the tail behavior of $F$, leads to the question: are there other estimators with good tail behavior over a broad class of possible $F$'s?
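To make (1.1) concrete, the following Python sketch evaluates $B(a,\bar X_n)$ exactly in two textbook cases where the distribution of the mean is known in closed form. The first is standard normal errors, where $\bar X_n \sim N(0,1/n)$. The second is standard Cauchy errors, where $\bar X_n$ is again standard Cauchy. The distributions, sample size and evaluation points are our own illustrative choices. Under normal errors the ratio climbs slowly toward the upper bound $n$. Under Cauchy errors it stays near the lower bound 1.

```python
import math


def normal_sf(x):
    """Upper tail 1 - Phi(x) of the standard normal (erfc keeps precision far out)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def cauchy_sf(x):
    """Upper tail 1 - F(x) of the standard Cauchy for x > 0.

    atan(1/x)/pi equals 1/2 - atan(x)/pi but avoids cancellation for large x.
    """
    return math.atan(1.0 / x) / math.pi


def tail_perf_mean_normal(a, n):
    """B(a, mean) from (1.1) for N(0,1) errors: the mean is N(0, 1/n)."""
    num = -math.log(2.0 * normal_sf(a * math.sqrt(n)))  # two-sided P(|mean| > a)
    den = -math.log(normal_sf(a))
    return num / den


def tail_perf_mean_cauchy(a, n):
    """B(a, mean) for standard Cauchy errors: the mean is again standard Cauchy,
    so the result does not depend on n at all."""
    num = -math.log(2.0 * cauchy_sf(a))
    den = -math.log(cauchy_sf(a))
    return num / den


for a in (2.0, 4.0, 8.0):
    print("normal", a, tail_perf_mean_normal(a, n=5))  # rises toward n = 5
for a in (1e3, 1e6, 1e12):
    print("cauchy", a, tail_perf_mean_cauchy(a, n=5))  # stays close to 1
```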

For L-estimators of location of the form

$$T_n = \sum_{i=1}^n c_i X_{(i)},$$

Jureckova (1979) proved that

$$k+1 \le \liminf_{a\to\infty} B(a,T_n) \le \limsup_{a\to\infty} B(a,T_n) \le n-k. \qquad (1.3)$$

Here $X_{(1)},\dots,X_{(n)}$ are the order statistics of the random sample $X_1,\dots,X_n$, and the weights satisfy $c_i\ge 0$ for $i=1,\dots,n$, $\sum_{i=1}^n c_i = 1$, and $c_i = c_{n-i+1} = 0$ for $i=1,\dots,k$, with $0\le k<n/2$.

Thus for the sample median with $n$ odd, $\lim_{a\to\infty} B(a,T_n) = (n+1)/2$. Furthermore, any Huber-type M-estimator $T_n$ has the same tail performance as the median for $F$'s with either exponential or algebraic tails. Here $T_n$ is defined as a solution of the equation $\sum_{i=1}^n \psi(X_i-t) = 0$, where $\psi$ is a nondecreasing, odd function such that $\psi(x)=\psi(k)$ for $x\ge k$, for some $k>0$. Note that this holds for Huber estimators with fixed scale; a sufficiently poor estimator of scale could wreak havoc here.
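The limit $(n+1)/2$ for the median can be checked directly. For the median of an odd sample, $P(T_n>a)$ is a binomial tail: at least $(n+1)/2$ of the $n$ observations must exceed $a$. The Python sketch below evaluates (1.1) exactly for standard Cauchy errors, an algebraically tailed choice of ours made for illustration. With $n=5$ the values creep up toward 3, in contrast to the sample mean, whose value stays near 1 under the same errors. Convergence is slow, because the ratio behaves like $(n+1)/2$ minus a constant divided by $-\log(1-F(a))$.

```python
import math


def cauchy_sf(x):
    """Upper tail 1 - F(x) of the standard Cauchy for x > 0."""
    return math.atan(1.0 / x) / math.pi


def median_upper_tail(a, n):
    """P(median > a) for odd n: at least (n + 1)/2 observations must exceed a."""
    p = cauchy_sf(a)
    need = (n + 1) // 2
    return sum(math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(need, n + 1))


def tail_perf_median(a, n):
    """B(a, median) from (1.1); by symmetry P(|median| > a) = 2 P(median > a)."""
    return -math.log(2.0 * median_upper_tail(a, n)) / -math.log(cauchy_sf(a))


for a in (1e3, 1e6, 1e9):
    print(a, tail_perf_median(a, n=5))  # approaches (n + 1)/2 = 3 from below
```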

The foregoing results seem to suggest a relationship between the tail performance of location estimators and the finite-sample version of the breakdown point introduced by Donoho and Huber (1983). The latter concept, originally suggested by Hampel (1968), has played a central role in recent work on robust estimation and testing, since it provides an appealing yet tractable quantitative assessment of robustness.

The finite-sample replacement version of the breakdown point $\varepsilon_n^*$ of an estimator $T_n$ is defined as follows. Let $x^0=(x_1,\dots,x_n)$ denote an initial sample, and let $x^m$ be a "contaminated" sample constructed by replacing $m$ arbitrary elements of $x^0$ with arbitrary values. The breakdown point of $T_n$ at the sample $x^0$ is then $\varepsilon_n^*=m^*/n$, where $m^*$ is the least integer $m$ such that $\sup_{x^m}\|T_n(x^m)-T_n(x^0)\|=\infty$. In words, $m^*$ is the smallest number of observations which, if replaced by arbitrary values, could drive $T_n$ beyond all bounds. The following result clarifies the relationship between tail performance and breakdown point for a large class of location estimators.
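A crude numerical reading of this definition for location estimators works as follows. Replace the first $m$ observations by one very large value, and report the first $m$ at which the estimate moves by an amount comparable to that value. The Python sketch below is only a heuristic. The function name, the replacement value and the threshold are our own choices, and contamination is pushed in one direction only, so a detected breakdown at $m$ merely shows $m^*\le m$. For the sample mean it reports $m^*=1$, and for the median of $n=5$ points it reports $m^*=3=(n+1)/2$.

```python
import statistics


def replacement_breakdown_count(estimator, x, big=1e12):
    """Heuristic m*: the smallest m such that replacing the first m observations
    by `big` moves the estimate by more than big / (10 n).

    Hypothetical helper for illustration; contamination goes in one direction only.
    """
    n = len(x)
    base = estimator(x)
    for m in range(1, n + 1):
        contaminated = [big] * m + list(x[m:])
        if abs(estimator(contaminated) - base) > big / (10 * n):
            return m
    return None  # no breakdown detected


def mean(v):
    return sum(v) / len(v)


sample = [0.3, -1.2, 0.8, 2.1, -0.5]
print(replacement_breakdown_count(mean, sample))               # 1
print(replacement_breakdown_count(statistics.median, sample))  # 3, i.e. (n + 1)/2
```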

THEOREM 1.1. Suppose $T_n(X_1,\dots,X_n)$ is a location equivariant estimator of $\theta$ such that $T_n$ is nondecreasing in each argument $X_i$. Then for any symmetric, absolutely continuous $F$ with density $f(z)=f(-z)>0$, $z\in\mathbb{R}^1$,

$$m^* \le \liminf_{a\to\infty} B(a,T_n) \le \limsup_{a\to\infty} B(a,T_n) \le n-m^*+1.$$

The key point of the proof is the following:

LEMMA 1.1. Under the conditions of Theorem 1.1, there exists a constant $A$ such that

$$x_{(m^*)} - A \le T_n \le x_{(n-m^*+1)} + A.$$

Proof. Let $m=m^*$, and let $+$ denote any positive number. By equivariance,

$$\begin{aligned}
T_n(x_1,\dots,x_n) &= T_n(x_{(1)}-x_{(m)},\dots,x_{(m-1)}-x_{(m)},0,+,\dots,+) + x_{(m)}\\
&\ge x_{(m)} + T_n(x_{(1)}-x_{(m)},\dots,x_{(m-1)}-x_{(m)},0,0,\dots,0),
\end{aligned}$$

where the inequality uses the monotonicity of $T_n$. By the definition of $m^*$, $|T_n|$ with only $m^*-1$ possible outliers (and the remaining arguments equal to zero) is bounded, say by $A$. Hence $T_n\ge x_{(m^*)}-A$. The other inequality may be established similarly. ■

Proof (of Theorem 1.1). From the lemma,

$$P_\theta(T_n-\theta>a) = P_0(T_n>a) \ge P_0(x_{(m)}>a+A) \ge (1-F(a+A))^{n-m+1}.$$

Thus,

$$B(a,T_n) = \frac{-\log 2P_0(T_n>a)}{-\log(1-F(a))} \le \frac{-\log 2 - (n-m+1)\log(1-F(a+A))}{-\log(1-F(a))}.$$

Letting $a\to\infty$ we get $\limsup_{a\to\infty} B(a,T_n)\le n-m+1$. On the other hand, $\liminf_{a\to\infty} B(a,T_n)\ge m$ follows from

$$P_0(T_n>a) \le P_0(x_{(n-m+1)}>a-A) \le \binom{n}{m}(1-F(a-A))^m. \qquad ■$$

Remarks: Note that Theorem 1.1 holds for both exponential and algebraic tails of $F$. The conditions on $T_n$ are satisfied for M-estimators with monotone $\psi$ and for L-estimators with positive weight functions. They are not satisfied, for example, by redescending M-estimators or the least median of squares.

If $T_n$ has high breakdown, like the sample median, with $\varepsilon_n^*\to 1/2$, then $\limsup_{a\to\infty} B(a,T_n)\le n-m^*+1\approx n/2$ for either error distribution type. This seems to suggest that highly robust methods necessarily sacrifice performance in light-tailed circumstances. Finally, the high-breakdown estimators are seen to satisfy a minimax property: they maximize least favorable tail performance over the two distribution types described in Jureckova (1981) and defined in the next section.

The intimate connection between finite-sample breakdown and finite-sample tail performance is developed further in subsequent sections:

- Section 2 introduces a measure of tail performance for regression estimators and discusses some possible alternative measures.
- Section 3 contains a detailed analysis of the tail performance of the least-squares estimator.
- Section 4 investigates the tail behavior of various robust estimators of the parameters of the linear model.
- Some concluding remarks on the relationship between breakdown and tail performance appear in the final section.

2. A Measure of Tail Performance for Regression Estimators

We now turn to the linear model

$$Y = X\beta + e, \qquad (2.1)$$

where:

- $Y$ is an $n$-vector of random responses;
- $X$ is a fixed $(n\times p)$ design matrix of rank $p$ with rows $x_i$, $i=1,\dots,n$;
- $\beta$ is an unknown $p$-dimensional parameter vector;
- $e$ is an $n$-vector of independent errors with common absolutely continuous distribution function $F$.

Assume throughout that

$$0<F(z)<1, \qquad F(z)+F(-z)=1, \qquad z\in\mathbb{R}^1. \qquad (2.2)$$

We wish to estimate the parameter $\beta$ without precise knowledge of the shape of $F$. Consider estimators $T_n$ of $\beta$ satisfying the affine (or regression) equivariance condition:

CONDITION A. $T_n(Y_1+x_1b,\dots,Y_n+x_nb) = T_n(Y_1,\dots,Y_n)+b$, $\quad b\in\mathbb{R}^p$.

To extend the measure of tail performance (1.1) used for scalar location estimators to the $p$-dimensional regression context, we propose

$$B(a,T_n) = \frac{-\log P_\beta\big(\max_i|x_i(T_n-\beta)|>a\big)}{-\log(1-F(a))}. \qquad (2.3)$$

The numerator concerns the probability of a discrepancy of $a$ between some fitted value $\hat Y_i = x_iT_n$ and $EY_i$. Obviously, we would like this probability to tend to zero as quickly as possible as $a\to\infty$. But, as in the case of location estimators, this rate is inherently controlled by the tail behavior of the errors. We should therefore expect good estimators to have $B(a,T_n)$ reasonably high. Since we cannot hope to control $B(a,T_n)$ for all $a$, we study the tail behavior of $T_n$, i.e. the limiting behavior of $B(a,T_n)$ as $a\to\infty$.

It is crucial in the sequel to distinguish between two broad classes of tail behavior for the underlying error distribution $F$. Following Jureckova (1981), a distribution $F$ will be called Type I (exponentially tailed) if

$$\lim_{a\to\infty}\frac{-\log(1-F(a))}{b\,a^r} = 1$$

for some $b>0$ and $r>0$. It will be called Type II (algebraically tailed) if

$$\lim_{a\to\infty}\frac{-\log(1-F(a))}{m\log a} = 1$$

for some $m>0$.
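As a quick numerical sanity check of these two definitions, the Python sketch below evaluates both ratios for three standard distributions whose tails are known in closed form:

- the standard normal, which is Type I with $b=1/2$ and $r=2$;
- the standard Laplace, which is Type I with $b=r=1$;
- the standard Cauchy, which is Type II with $m=1$.

The helper names are our own. Convergence is slow because of lower-order logarithmic and constant terms, so the ratios are only approximately 1 even at large $a$.

```python
import math


def normal_sf(x):
    """1 - Phi(x) for the standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def laplace_sf(x):
    """1 - F(x) for the standard Laplace, x >= 0."""
    return 0.5 * math.exp(-x)


def cauchy_sf(x):
    """1 - F(x) for the standard Cauchy, x > 0."""
    return math.atan(1.0 / x) / math.pi


def type1_ratio(sf, a, b, r):
    """-log(1 - F(a)) / (b a^r): tends to 1 for a Type I law with constants (b, r)."""
    return -math.log(sf(a)) / (b * a**r)


def type2_ratio(sf, a, m):
    """-log(1 - F(a)) / (m log a): tends to 1 for a Type II law with exponent m."""
    return -math.log(sf(a)) / (m * math.log(a))


print(type1_ratio(normal_sf, 30.0, 0.5, 2))    # normal: b = 1/2, r = 2
print(type1_ratio(laplace_sf, 100.0, 1.0, 1))  # Laplace: b = 1, r = 1
print(type2_ratio(cauchy_sf, 1e100, 1.0))      # Cauchy: m = 1
```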

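The tail performance bound (1.2) extends to the regression model if we impose a mild further condition on $T_n$: that there exist at least one nonnegative and one nonpositive residual $r_i=y_i-x_iT_n$, $i=1,\dots,n$. Write $\hat y_i=x_iT_n$. Under this condition, $y_{(1)}>a$ implies $\hat y_{(n)}>a$: if every $y_i$ exceeds $a$, the observation with a nonpositive residual has fitted value at least $y_i>a$. Likewise, $y_{(n)}<-a$ implies $\hat y_{(1)}<-a$. Thus, using the symmetry of $F$ (and taking $\beta=0$),

$$P_0\big(|\hat y|_{(n)}>a\big) \ge 2P_0\big(y_{(1)}>a\big) = 2(1-F(a))^n,$$

and hence

$$\limsup_{a\to\infty} B(a,T_n) \le n.$$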

However, as subsequent results will illustrate, achieving this upper bound may be limited to the case of the sample mean in the location model with exponential tails. A lower bound of 1 may be derived for errors of Type II under somewhat more stringent conditions. For $F$ of Type I, however, it is easy to construct examples for which $\limsup_{a\to\infty} B(a,T_n)$ is less than 1.

In fact, consider a simple linear regression (intercept and slope) with $n=3$ and design rows $(1,0)$, $(1,1)$ and $(1,2)$, and let $T_n$ be determined by the first two observations, i.e. the line through $(0,y_1)$ and $(1,y_2)$. Its fitted value at the third design point is $2y_2-y_1$, so if $|y_1|<\epsilon$ then $\max_i|x_iT_n|\ge|2y_2-y_1|>2|y_2|-\epsilon$. Hence

$$P\big(\max_i|x_iT_n|>a\big) \ge P(|y_1|<\epsilon)\,P\big(|y_2|>(a+\epsilon)/2\big) \ge c\,P\big(y_2>(a+\epsilon)/2\big).$$

So, for Type I distributions, $\overline{B}\le(1/2)^r<1$.
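The construction can be made concrete for standard Laplace errors, which are Type I with $b=r=1$, so the bound reads $\overline{B}\le1/2$. In the Python sketch below (function names ours), `fit_first_two` is the estimator of the example. `b_upper_laplace` evaluates in closed form the ratio

$$\frac{-\log\{P(|y_1|<\epsilon)\,P(|y_2|>(a+\epsilon)/2)\}}{-\log(1-F(a))},$$

which bounds $B(a,T_n)$ from above. The choice $\epsilon=1$ is arbitrary.

```python
import math


def fit_first_two(y1, y2):
    """Line through (0, y1) and (1, y2): the estimator T_n of the example,
    returned as (intercept, slope)."""
    return y1, y2 - y1


def fitted_max(y1, y2, xs=(0.0, 1.0, 2.0)):
    """max_i |x_i T_n| for design rows (1, x_i); the third fitted value is 2*y2 - y1."""
    b0, b1 = fit_first_two(y1, y2)
    return max(abs(b0 + b1 * x) for x in xs)


def b_upper_laplace(a, eps=1.0):
    """Upper bound on B(a, T_n) for standard Laplace errors, obtained from
    P(max|x_i T_n| > a) >= P(|y1| < eps) * P(|y2| > (a + eps)/2).

    For standard Laplace: P(|e| < t) = 1 - exp(-t), P(|e| > t) = exp(-t),
    and -log(1 - F(a)) = a + log 2.
    """
    neg_log_lower = (a + eps) / 2.0 - math.log(1.0 - math.exp(-eps))
    return neg_log_lower / (a + math.log(2.0))


print(fitted_max(0.1, 5.0))  # at least 2*5.0 - 0.1
for a in (10.0, 100.0, 1000.0):
    print(a, b_upper_laplace(a))  # tends to (1/2)^r = 1/2 since r = 1
```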

The measure of performance (2.3) is only one of many possible criteria of tail performance for the $p$-dimensional regression estimator, and one is naturally led to ask whether results are sensitive to the specific form of the criterion. This question is partially answered by the following theorem.

THEOREM 2.1. Let $\Gamma$ be the class of all $\gamma:\mathbb{R}^p\to\mathbb{R}$ which are strictly positive (for nonzero arguments), continuous, and linearly homogeneous (i.e., $\gamma(cb)=c\gamma(b)$ for $c>0$). For any $\gamma\in\Gamma$, define the tail criterion

$$B_\gamma(T_n,a) = \frac{-\log P\{\gamma(T_n-\beta)>a\}}{-\log(1-F(a))}, \qquad (2.4)$$

and define $\underline{B}_\gamma=\liminf_{a\to\infty}B_\gamma(T_n,a)$ and $\overline{B}_\gamma=\limsup_{a\to\infty}B_\gamma(T_n,a)$. Then for any $\gamma_1\in\Gamma$ and $\gamma_2\in\Gamma$ there is a constant $c$ such that $\gamma_1(b)\le c\,\gamma_2(b)$ for all $b$. As a consequence, for any Type II distribution, $\underline{B}_{\gamma_1}=\underline{B}_{\gamma_2}$ and $\overline{B}_{\gamma_1}=\overline{B}_{\gamma_2}$.


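Proof. For $\gamma\in\Gamma$, define

$$\bar c(\gamma) = \sup_{\|b\|=1}\gamma(b) < +\infty, \qquad \underline{c}(\gamma) = \inf_{\|b\|=1}\gamma(b) > 0.$$

Then, for any $\gamma_1$ and $\gamma_2$,

$$\gamma_1(b) \le \bar c(\gamma_1)\|b\| \le \frac{\bar c(\gamma_1)}{\underline{c}(\gamma_2)}\,\gamma_2(b) = \frac{\gamma_2(b)}{c'},$$

where $c' = \underline{c}(\gamma_2)/\bar c(\gamma_1)$; and hence

$$P\{\gamma_1(T_n-\beta)>a\} \le P\{\gamma_2(T_n-\beta)>c'a\}.$$

As a consequence, letting $a^* = c'a$,

$$B_{\gamma_1}(T_n,a) \ge \frac{-\log P\{\gamma_2(T_n-\beta)>a^*\}}{-\log P\{e_i>a^*\}}\cdot\frac{-\log P\{e_i>a^*\}}{-\log P\{e_i>a\}} = B_{\gamma_2}(T_n,a^*)\,\frac{-\log P\{e_i>a^*\}}{-\log P\{e_i>a\}}. \qquad (2.5)$$

The result follows, since the roles of $\gamma_1$ and $\gamma_2$ can be interchanged and the ratio of logarithms of probabilities in (2.5) tends to one for Type II distributions. ■

Remark. For Type I distributions, the ratio of logarithms of probabilities in (2.5) tends to $(c')^r$, and so tail behavior depends on the choice of $\gamma$.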
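The contrast between the two tail types in (2.5) can be seen numerically. The Python sketch below (helper names ours) evaluates the ratio $\log P\{e_i>c'a\}/\log P\{e_i>a\}$ for two errors: a standard normal, which is Type I with $r=2$, and a standard Cauchy, which is Type II. The value $c'=1/2$ is an arbitrary illustration. The normal ratio should sit near $(c')^2=0.25$, and the Cauchy ratio near 1.

```python
import math


def normal_sf(x):
    """1 - Phi(x) for the standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def cauchy_sf(x):
    """1 - F(x) for the standard Cauchy, x > 0."""
    return math.atan(1.0 / x) / math.pi


def log_ratio(sf, a, c_prime):
    """The factor -log P(e > c' a) / -log P(e > a) appearing in (2.5)."""
    return math.log(sf(c_prime * a)) / math.log(sf(a))


print(log_ratio(normal_sf, 30.0, 0.5))   # Type I, r = 2: near 0.5**2
print(log_ratio(cauchy_sf, 1e100, 0.5))  # Type II: near 1
```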

For $\gamma$ defined in terms of residuals, it is possible to define a corresponding M-estimator whose tail behavior is determined by one-dimensional tail behavior. Using such an estimator, one can obtain tail behavior with $\underline{B}\ge[(n-p+1)/2]$ for Type II distributions. Such robustness results will be discussed further in Section 4.

LEMMA 2.1. Consider criteria $\gamma$ defined by $\gamma_X(b)=\gamma^*(|Xb|)=\gamma^*(|x_1b|,\dots,|x_nb|)$, where $\gamma^*:\mathbb{R}^n_+\to\mathbb{R}_+$. Let $\gamma^*$ and $\tilde\gamma$ be two such functions satisfying

(i) $\gamma^*(r)$ is nondecreasing in each coordinate of $r$;

(ii) $\gamma^*(r+s)\le c\,(\tilde\gamma(r)+\tilde\gamma(s))$ for some constant $c=c(X)$;

(iii) there exists an estimator $T_n^*$ minimizing $\tilde\gamma(|Xb-Y|)$ over $b$.

Then, for any Type II distribution,

$$P\{\gamma^*(|X(T_n^*-\beta)|)>a\} \le P\{\tilde\gamma(|e|)>a/(2c)\}.$$

Proof: Using the triangle inequality and the conditions above,

$$\begin{aligned}
P\{\gamma^*(|X(T_n^*-\beta)|)>a\} &\le P\{\gamma^*(|XT_n^*-Y|+|X\beta-Y|)>a\}\\
&\le P\{c\,\tilde\gamma(|XT_n^*-Y|)+c\,\tilde\gamma(|X\beta-Y|)>a\}\\
&\le P\{2c\,\tilde\gamma(|X\beta-Y|)>a\}\\
&= P\{\tilde\gamma(|e|)>a/(2c)\}.
\end{aligned}$$

■

This result can be used to show that there exists an estimator achieving $\underline{B}\ge[(n-p+1)/2]$ in Type II cases. This bound may be the best possible in general, and it corresponds exactly to the best breakdown bound possible for affine invariant estimators (see Rousseeuw and Leroy (1987, p. 125)).

COROLLARY. Let $\gamma^*(r)=r_{(p+1)}$, the $(p+1)$th smallest $r_i$. With $k=[(n-p+1)/2]$, let $\tilde\gamma(r)=r_{(n-k+1)}$, so that $T_n^*$ minimizes the corresponding $k$th largest absolute residual. Then $\underline{B}(T_n^*)\ge k$.

Proof. Condition (i) is clear. For condition (ii), note that if $(r+s)_{(p+1)}=a$, then $r_i\ge a/2$ or $s_i\ge a/2$ for at least $n-p$ indices. Hence either $r_i\ge a/2$ for $k$ indices or $s_i\ge a/2$ for $k$ indices. Therefore,

$$(r+s)_{(p+1)} \le 2\big(r_{(n-k+1)}+s_{(n-k+1)}\big).$$

Condition (iii) follows from general results on S-estimators, so the Lemma holds. Furthermore, for some constant $c$,

$$P\{|e|_{(n-k+1)}>a\} \le c\,(P\{e_i>a\})^k,$$

and the result follows from (2.4) and Theorem 2.1. ■
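Condition (ii) in the corollary is a purely combinatorial statement about order statistics of nonnegative vectors, so it can be spot-checked by simulation. The Python sketch below draws random nonnegative $r$ and $s$ and tests $(r+s)_{(p+1)}\le2(r_{(n-k+1)}+s_{(n-k+1)})$ with $k=[(n-p+1)/2]$. The function names and the exponential sampling distribution are our own choices. A random search of this kind can only fail to find counterexamples; it does not replace the pigeonhole argument above.

```python
import random


def order_stat(v, j):
    """j-th smallest entry of v (1-indexed), i.e. v_(j)."""
    return sorted(v)[j - 1]


def check_condition_ii(n, p, trials=5000, seed=0):
    """Test (r+s)_(p+1) <= 2 (r_(n-k+1) + s_(n-k+1)), k = [(n-p+1)/2],
    on random nonnegative vectors; returns False on the first violation."""
    rng = random.Random(seed)
    k = (n - p + 1) // 2
    for _ in range(trials):
        r = [rng.expovariate(1.0) for _ in range(n)]
        s = [rng.expovariate(1.0) for _ in range(n)]
        lhs = order_stat([ri + si for ri, si in zip(r, s)], p + 1)
        rhs = 2.0 * (order_stat(r, n - k + 1) + order_stat(s, n - k + 1))
        if lhs > rhs:
            return False
    return True


print(check_condition_ii(7, 2), check_condition_ii(10, 3))
```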

3. Tail Behavior of the Least Squares Estimator

The tail performance of the classical least-squares estimator suffers from the same sensitivity to the tail behavior of $F$ as its counterpart in the location model, the sample mean.



THEOREM 3.1. Let $T_n$ be the least-squares estimator of $\beta$ in the model (2.1), with $F$ satisfying condition (2.2). Let $h=\max_i h_{ii}=\max_i x_i(X'X)^{-1}x_i'$.

(i) If $F$ is of Type I with $1<r\le 2$, then

$$h^{1-r} \le \liminf_{a\to\infty} B(a,T_n) \le \limsup_{a\to\infty} B(a,T_n) \le h^{-r}.$$

(ii) If $F$ is of Type I with $r=1$, then

$$h^{-1/2} \le \liminf_{a\to\infty} B(a,T_n) \le \limsup_{a\to\infty} B(a,T_n) \le h^{-1}.$$

(iii) If $F$ is normal, then

$$\lim_{a\to\infty} B(a,T_n) = h^{-1}.$$

(iv) If $F$ is of Type II, then

$$\lim_{a\to\infty} B(a,T_n) = 1.$$

Proof. Let $H=X(X'X)^{-1}X'$ be the projection (hat) matrix corresponding to $X$, with rows $h_i'$, and suppose that $h=h_{11}$. Recall that $0<h\le 1$ and $\hat Y_i = x_iT_n = h_i'Y$, so we may write (taking $\beta=0$ by equivariance)

$$\begin{aligned}
P_\beta\big(\max_i|x_i(T_n-\beta)|>a\big) &= P_0\big(\max_i|\hat Y_i|>a\big) \ge P_0(h_1'Y>a)\\
&\ge P_0(hY_1>a,\ h_{12}Y_2\ge 0,\dots,h_{1n}Y_n\ge 0) \ge (1-F(a/h))(1/2)^{n-1}.
\end{aligned}$$

Hence,

$$\limsup_{a\to\infty} B(a,T_n) \le \limsup_{a\to\infty}\frac{-\log(1-F(a/h))}{-\log(1-F(a))}. \qquad (3.1)$$

For $F$ of Type I this further implies

$$\limsup_{a\to\infty} B(a,T_n) \le \limsup_{a\to\infty}\frac{b(a/h)^r}{ba^r} = h^{-r}, \qquad (3.2)$$

which gives the upper bounds in (i) and (ii), respectively. For $F$ of Type II, (3.1) implies

$$\limsup_{a\to\infty} B(a,T_n) \le \limsup_{a\to\infty}\frac{m\log(a/h)}{m\log a} = 1. \qquad (3.3)$$

And, since the least-squares estimator has at least one positive and one negative residual, (iv) follows.

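On the other hand, suppose $F$ is of Type I with $1<r\le2$. By Markov's inequality, for any $\epsilon\in(0,1)$,

$$P_\beta\big(\max_i|x_i(T_n-\beta)|>a\big) = P_0\big(\max_i|\hat Y_i|>a\big) \le E_0\exp\{(1-\epsilon)bh^{1-r}(\max_i|\hat Y_i|)^r\}\,\exp\{-(1-\epsilon)bh^{1-r}a^r\}. \qquad (3.4)$$

So if

$$0 < E_0\exp\{(1-\epsilon)bh^{1-r}(\max_i|\hat Y_i|)^r\} \le C_\epsilon < \infty, \qquad (3.5)$$

then

$$-\log P_0\big(\max_i|\hat Y_i|>a\big) \ge -\log C_\epsilon + (1-\epsilon)bh^{1-r}a^r,$$

and the lower bound in (i) follows, since $\epsilon$ is arbitrary. To prove (3.5) we may write, by Hölder's inequality,

$$(\max_i|\hat Y_i|)^r = \max_i|h_i'Y|^r \le \max_i\big(\|h_i\|_s\|Y\|_r\big)^r \le \max_i\Big(\sum_j|h_{ij}|^s\Big)^{r/s}\sum_j|Y_j|^r \le h^{r-1}\sum_j|Y_j|^r,$$

where $s=r/(r-1)\ge2$. Hence,

$$E_0\exp\{(1-\epsilon)bh^{1-r}(\max_i|\hat Y_i|)^r\} \le E_0\exp\Big\{(1-\epsilon)b\sum_j|Y_j|^r\Big\} \le \big(E_0\exp\{(1-\epsilon)b|Y_1|^r\}\big)^n. \qquad (3.6)$$

Using an integration by parts,

$$\begin{aligned}
0 < E_0\exp\{(1-\epsilon)b|Y_1|^r\} &= -2\int_0^\infty\exp\{(1-\epsilon)by^r\}\,d(1-F(y))\\
&\le 2\int_0^K\exp\{(1-\epsilon)by^r\}\,dF(y) + 2\exp\{(1-\epsilon)bK^r\}(1-F(K))\\
&\quad + 2\int_K^\infty r(1-\epsilon)by^{r-1}(1-F(y))\exp\{(1-\epsilon)by^r\}\,dy\\
&\le 2\int_0^K\exp\{(1-\epsilon)by^r\}\,dF(y) + 2(1-F(K))\exp\{(1-\epsilon)bK^r\}\\
&\quad + 2\int_K^\infty r(1-\epsilon)by^{r-1}\exp\{-(\epsilon/2)by^r\}\,dy \le C_\epsilon < \infty,
\end{aligned}$$

for $K$ such that $1-F(y)\le\exp\{-(1-\epsilon/2)by^r\}$ for $y\ge K$. This gives the lower bounds in (i) and (iii), respectively.

Analogously, if $r=1$, then

$$P_0\big(\max_i|\hat Y_i|>a\big) \le E_0\exp\{(1-\epsilon)bh^{-1/2}\max_i|\hat Y_i|\}\,\exp\{-(1-\epsilon)bh^{-1/2}a\}.$$

Moreover, $\sum_jh_{ij}^2=h_{ii}\le h$ implies $|h_{ij}|\le h^{1/2}$, so

$$\max_i|\hat Y_i| = \max_i|h_i'Y| \le h^{1/2}\sum_j|Y_j|,$$

which gives the lower bound in (ii).

Finally, if $F$ is normal $N(0,\sigma^2)$, then $\hat Y-X\beta$ has an $n$-dimensional normal distribution $N(0,\sigma^2H)$. Hence

$$P\big(\max_i|\hat Y_i|>a\big) \ge P_0(h_1'Y>a) = 1-\Phi(a\sigma^{-1}h^{-1/2}),$$

and this gives the upper bound in (iii). ■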

Remarks. In the one-sample location model $h=1/n$, and thus (iii) specializes to Theorem 2.2 of Jureckova (1981). In the linear model with leverage points, however, $h$ may be near one, and tail performance would consequently be very poor.

For $F$ of Type II, the tail behavior of the LSE is always extremely poor. In effect, its tail performance with $n$ observations is no better than that with a single observation; that is, here the tail performance is the same as the breakdown.

In the balanced $p$-sample problem $h=p/n$, while for stochastic designs $h=O_p(p/n)$ under some regularity conditions. This suggests the tail-performance bound $n/p$ in the Gaussian case. In heavy-tailed cases much better performance is possible with robust methods of estimation, to which we now turn our attention.
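Since the Gaussian tail performance of least squares in Theorem 3.1(iii) is just $h^{-1}$, it can be read off the diagonal of the hat matrix. The Python sketch below does this for a straight-line fit with intercept, using the closed-form leverages $h_{ii}=1/n+(x_i-\bar x)^2/\sum_j(x_j-\bar x)^2$ for that model. The two designs are made up for illustration. An equally spaced design with $n=5$ gives $h=0.6$, so the limit is $1/0.6\approx1.67$. Moving one design point far out pushes $h$ close to one, and the limit $h^{-1}$ close to the single-observation value 1.

```python
def leverages_simple_regression(xs):
    """Diagonal of H = X (X'X)^{-1} X' for design rows (1, x_i):
    h_ii = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [1.0 / n + (x - xbar) ** 2 / sxx for x in xs]


def gaussian_tail_performance(xs):
    """lim B(a, T_n) = 1/h for least squares under normal errors (Theorem 3.1 (iii))."""
    return 1.0 / max(leverages_simple_regression(xs))


balanced = [0.0, 1.0, 2.0, 3.0, 4.0]
with_leverage = [0.0, 1.0, 2.0, 3.0, 100.0]
for design in (balanced, with_leverage):
    print(max(leverages_simple_regression(design)), gaussian_tail_performance(design))
```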

4. Tail Performance of Robust Estimators of the Linear Model

Given the poor tail performance of the least-squares estimator under heavy-tailed (Type II) conditions, it is useful to know whether better performance is possible from various robust estimators. Any amount of Type II contamination of a Type I distribution yields a Type II distribution (Jureckova (1981)). The tail performance of regression estimators under Type II conditions therefore appears to be a useful quantitative assessment of robustness.

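An upper bound on the tail performance of L-estimators of the regression parameter based on regression quantiles (Koenker and Bassett (1978) and Koenker and Portnoy (1987)) rests on the following result.

LEMMA 4.1. Suppose that in the linear model (2.1) the design matrix $X$ contains an intercept, i.e. $Xb=1_n$ for some $b$, and the error distribution $F$ satisfies (2.2). Then for any $\theta$th regression quantile, $\overline{B}\le[\min\{\theta,1-\theta\}n]+1$.

Proof. Let $k=[\min\{\theta,1-\theta\}n]+1$ and suppose, say, $\theta\le1/2$, so that $k>\theta n$. If $k$ of the observations satisfy $y_i<-a$ while $\max_i|\hat y_i|\le a$, there are at least $k$ strictly negative residuals. But by Theorem 3.4 of Koenker and Bassett (1978), the $\theta$th regression quantile has at most $\theta n<k$ strictly negative residuals. Hence $P(\max_i|\hat y_i|>a)\ge(1-F(a))^k$, and the result follows for either type of error distribution. The case $\theta>1/2$ is symmetric, with $k$ observations exceeding $a$. ■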
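The counting fact behind this proof is easy to check in the simplest design, an intercept only, where the $\theta$th regression quantile reduces to a sample quantile. The Python sketch below minimizes the check-function objective $\sum_i\rho_\theta(y_i-t)$ over the observations, where a minimizer can always be found in this design. It then counts strictly negative and strictly positive residuals. The data and function names are made up for illustration. The counts should never exceed $\theta n$ and $(1-\theta)n$, respectively.

```python
def check_loss(u, theta):
    """Koenker-Bassett check function rho_theta(u) = u * (theta - 1{u < 0})."""
    return u * (theta - (1.0 if u < 0 else 0.0))


def intercept_only_quantile(ys, theta):
    """theta-th regression quantile when X is a column of ones: minimise the summed
    check loss; in this design a minimiser can be found among the observations."""
    return min(ys, key=lambda t: sum(check_loss(y - t, theta) for y in ys))


def sign_counts(ys, t):
    """Numbers of strictly negative and strictly positive residuals y_i - t."""
    neg = sum(1 for y in ys if y < t)
    pos = sum(1 for y in ys if y > t)
    return neg, pos


ys = [3.1, -0.4, 7.7, 1.2, 0.0, 5.5, -2.3, 9.9, 4.4, 2.2]
for theta in (0.1, 0.25, 0.5, 0.75, 0.9):
    t = intercept_only_quantile(ys, theta)
    neg, pos = sign_counts(ys, t)
    print(theta, t, neg, pos)  # expect neg <= theta*n and pos <= (1 - theta)*n
```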

THEOREM 4.1. Let $T_n$ be an estimator of the form

$$T_n = \int_{\alpha_0}^{\alpha_1} J(t)\,\hat\beta(t)\,dt,$$

where $\hat\beta(t)$ is the regression quantile process and $J$ is a nonnegative function which integrates to one on $[\alpha_0,\alpha_1]\subset(0,1)$. If $F$ is either Type I or Type II, then

$$\limsup_{a\to\infty} B(a,T_n) \le [\min\{\alpha_0,1-\alpha_1\}n]+1.$$

Proof. By the convexity of the measure of performance $\gamma(b)=\max_i|x_ib|$, we have $P\{\gamma(T_n)>a\}\le P\{\max_j\gamma(\hat\beta_j)>a\}$. Here the max is over the distinct regression quantiles $\hat\beta_j$ on the interval $[\alpha_0,\alpha_1]$. The result then follows from the preceding lemma. ■

Thus we find that the tail performance of the $\ell_1$ estimator may be as good as $(n+1)/2$, as, for example, in the case of the sample median. However, the lower bound on the tail performance of the $\ell_1$ estimator is much more informative. To pursue this, we define $m_*$ to be the largest integer $m$ such that for any subset $M$ of $N=\{1,2,\dots,n\}$ of size $m$,

$$\inf_{\|b\|=1}\frac{\sum_{i\in N\setminus M}|x_ib|}{\sum_{i\in N}|x_ib|} > \frac12.$$

($N\setminus M$ will denote the complement of $M$ with respect to $N$.) In the special case of scalar regression through the origin, $m_*+1$ is the smallest integer $k$ such that for some subset $K$ of size $k$, $\sum_{i\in K}|x_i|\ge\frac12\sum_{i\in N}|x_i|$. When the $x_i$'s are equally spaced on $[0,1]$, for example, $m_*/n$ tends to $1-1/\sqrt2 = .29289$ as $n\to\infty$.
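The through-the-origin characterization is easy to compute. The worst subset $K$ of a given size consists of the largest $|x_i|$, so it suffices to sort. The Python sketch below (function name ours) uses equally spaced $x_i=i/n$. For $n=1000$ it gives $m_*=293$, close to the limit $1-1/\sqrt2\approx.29289$ for $m_*/n$.

```python
import math


def m_star_through_origin(xs):
    """m_* for scalar regression through the origin: m_* + 1 is the smallest k such
    that the k largest |x_i| make up at least half of sum_i |x_i|."""
    a = sorted((abs(x) for x in xs), reverse=True)
    if not a or a[0] == 0:
        raise ValueError("need at least one nonzero x_i")
    half = 0.5 * sum(a)
    running = 0.0
    for k, v in enumerate(a, start=1):
        running += v
        if running >= half:
            return k - 1


n = 1000
m = m_star_through_origin([i / n for i in range(1, n + 1)])
print(m, m / n, 1 - 1 / math.sqrt(2))  # 293, 0.293, about 0.29289
```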

LEMMA 4.2. For the $\ell_1$ estimator, $\underline{B}\ge m_*+1$ for $F$ of Type II.

Proof. It suffices to show that if all but $m\le m_*$ of the $y$'s are bounded by 1, then the $\ell_1$ estimator will be uniformly bounded. By the triangle inequality,

$$\begin{aligned}
\sum_N|y_i-x_ib| &\ge \sum_{N\setminus M}|x_ib| - \sum_{N\setminus M}|y_i| - \Big(\sum_M|x_ib| - \sum_M|y_i|\Big)\\
&= \sum_{N\setminus M}|x_ib| - \sum_M|x_ib| + \sum_N|y_i| - 2\sum_{N\setminus M}|y_i|.
\end{aligned}$$

By the definition of $m_*$, there exists $c>\frac12$ such that, for all subsets $M$ of size $m\le m_*$,

$$\sum_{N\setminus M}|x_ib|\ge c\sum_N|x_ib| \quad\text{and}\quad \sum_M|x_ib|\le(1-c)\sum_N|x_ib|.$$

Therefore,

$$\sum_N|y_i-x_ib| - \sum_N|y_i| \ge (2c-1)\sum_N|x_ib| - 2\sum_{N\setminus M}|y_i|.$$

So if $|y_i|\le1$ for $i\in N\setminus M$, there exists a constant $C$ such that if $\|b\|>C$ then

$$\sum_N|y_i-x_ib| - \sum_N|y_i| > 0.$$

The proof may be completed by noting that the $\ell_1$ estimator is scale equivariant, so $\max_i|x_iT_n|\le C|y|_{(n-m_*)}$, where $|y|_{(n-m_*)}$ is the $(m_*+1)$th largest absolute $y_i$. ■

Lemma 4.2 may be extended to a broad class of "$\ell_1$-type" estimators characterized by $\rho$ functions such that, for some $K>0$ and all $u$,

$$|\rho(u)-|u|| \le K. \qquad (4.1)$$

In fact, all such $\ell_1$-type estimators have exactly the same tail behavior for a wide class of tail criteria and for both Type I and Type II distributions.
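Two familiar losses satisfy (4.1). The first is Huber's $\rho$ divided by its tuning constant $c$: it equals $u^2/(2c)$ for $|u|\le c$ and $|u|-c/2$ otherwise, so $K=c/2$. The second is the smooth loss $\sqrt{1+u^2}$, with $K=1$ attained at $u=0$. The short Python check below confirms both bounds numerically on a grid. The value $c=1.345$ and the grid are illustrative choices of ours. A grid check is evidence rather than proof, although both bounds are also easy to verify by hand.

```python
import math


def huber_scaled(u, c=1.345):
    """Huber's rho divided by c: u^2/(2c) for |u| <= c, |u| - c/2 otherwise."""
    return u * u / (2.0 * c) if abs(u) <= c else abs(u) - c / 2.0


def pseudo_huber(u):
    """sqrt(1 + u^2), a smooth approximation to |u|."""
    return math.sqrt(1.0 + u * u)


def max_gap(rho, grid):
    """Largest |rho(u) - |u|| over the grid: an empirical value of K in (4.1)."""
    return max(abs(rho(u) - abs(u)) for u in grid)


grid = [i / 10.0 for i in range(-10_000, 10_001)]  # u in [-1000, 1000]
print(max_gap(huber_scaled, grid))  # at most c/2
print(max_gap(pseudo_huber, grid))  # 1, attained at u = 0
```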


LEMMA 4.3. Let $\hat\beta$ be an M-estimator minimizing $\sum_i\rho(y_i-x_ib)$, where $\rho$ satisfies (4.1). Let $\hat\beta_1$ denote the $\ell_1$ estimator. Then there is a constant $c$ such that

$$\|\hat\beta-\hat\beta_1\| \le \frac{2nK}{c}. \qquad (4.2)$$

Proof. Without loss of generality assume that $\hat\beta_1$ is unique. Otherwise, the vector $y$ could be perturbed by a very small amount so that $\hat\beta_1$ is unique, and the residuals (and hence also the objective function for $\rho$) change by a bounded amount.

Now consider the objective function for the $\ell_1$ estimator, and let $h=\{i: x_i\hat\beta_1=y_i\}$. From Koenker and Bassett (1978), the directional derivative of the objective function at $\hat\beta_1$ in direction $b$ (with $\|b\|=1$) satisfies

$$-\sum_{i\notin h}\operatorname{sgn}(y_i-x_i\hat\beta_1)\,x_ib + \sum_{i\in h}|x_ib| > 0 \quad\text{for all } b.$$

For each $b$ of norm 1, define the finite set $E(b)$ and the function $g(b)$ as follows:

$$E(b) = \Big\{(\epsilon_i)_{i\notin h},\ \epsilon_i=\pm1 : -\sum_{i\notin h}\epsilon_ix_ib + \sum_{i\in h}|x_ib| > 0\Big\},$$

$$g(b) = \min_{\epsilon\in E(b)}\Big\{-\sum_{i\notin h}\epsilon_ix_ib + \sum_{i\in h}|x_ib|\Big\}.$$

It follows that for each $b$ there is a neighborhood of $b$ on which $g$ is continuous, and $g(b)>0$. Let $c=\inf\{g(b):\|b\|=1\}$. Then $c$ is a positive constant depending only on the design matrix. Therefore, using (4.1), for $\|\beta-\hat\beta_1\|>2nK/c$,

$$\begin{aligned}
\sum_{i=1}^n\rho(y_i-x_i\beta) &\ge \sum_{i=1}^n|y_i-x_i\beta| - nK\\
&> \sum_{i=1}^n|y_i-x_i\hat\beta_1| + \frac{2nK}{c}\,c - nK\\
&\ge \sum_{i=1}^n\rho(y_i-x_i\hat\beta_1).
\end{aligned}$$

Hence, since $\hat\beta$ minimizes its objective function, $\|\hat\beta-\hat\beta_1\|\le2nK/c$. ■

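THEOREM 4.2. Following Theorem 2.1, let $\gamma\in\Gamma$ be a norm, so that in addition to positive linear homogeneity,

$$\gamma(a+b) \le \gamma(a)+\gamma(b) \quad\text{for all } a,b\in\mathbb{R}^p. \qquad (4.3)$$

Let $\hat\beta$ be any M-estimator whose $\rho$-function satisfies (4.1). Then, for both Type I and Type II distributions,

$$\underline{B}_\gamma(\hat\beta)=\underline{B}_\gamma(\hat\beta_1) \quad\text{and}\quad \overline{B}_\gamma(\hat\beta)=\overline{B}_\gamma(\hat\beta_1).$$

In particular, $\underline{B}(\hat\beta)\ge m_*+1$ for $F$ of Type II.

Proof. From (4.3),

$$-\gamma(\hat\beta_1-\hat\beta) \le \gamma(\hat\beta-\beta)-\gamma(\hat\beta_1-\beta) \le \gamma(\hat\beta-\hat\beta_1).$$

It follows from the proof of Theorem 2.1 and from Lemma 4.3 that

$$|\gamma(\hat\beta-\beta)-\gamma(\hat\beta_1-\beta)| \le \bar c(\gamma)\,\|\hat\beta-\hat\beta_1\| \le c^*.$$

Therefore the tail behavior $B_\gamma(\hat\beta,a)$ differs from $B_\gamma(\hat\beta_1,a)$ by at most a factor $-\log P\{e>a-c^*\}/-\log P\{e>a\}$. This factor tends to 1 as $a\to\infty$ for both Type I and Type II distributions. ■

Remark: Note that $\gamma(b)=\max_i|x_ib|$ satisfies the hypotheses of Theorem 4.2.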
The following result establishes that this lower bound on the tail performance of $\ell_1$-type estimators is essentially the same as their (Donoho-Huber) finite-sample breakdown point.

THEOREM 4.3. For $\ell_1$-type estimators, $m_*+1\le m^*\le m_*+2$.

Proof. It has already been shown above that $\ell_1$-type estimators will not break down if there are only $m_*$ outliers. Thus $m^*\ge m_*+1$.

By the definition of $m_*$, there exist a subset $M$ of size $m_*+1$ and a vector $b_0$ with $\|b_0\|=1$ such that $\sum_{N\setminus M}|x_ib_0|\le\sum_M|x_ib_0|$. Thus, for $m=m_*+2$, there exists a subset $M$ such that $\sum_{N\setminus M}|x_ib_0|<\sum_M|x_ib_0|$, with strict inequality. Let $\eta(b)=\sum_M|x_ib|-\sum_{N\setminus M}|x_ib|$. Clearly $\eta(b_0)>0$, and with $b_c=cb_0$ we have $\eta(b_c)=c\,\eta(b_0)$. Suppose $y_i=0$ for $i\in N\setminus M$ and $y_i=cx_ib_0$ for $i\in M$. Then

$$\sum_N|y_i-x_ib_c| = \sum_{N\setminus M}|x_ib_c| = \sum_M|x_ib_c| - c\,\eta(b_0) = \sum_N|y_i| - c\,\eta(b_0).$$

On the other hand, for any bounded $\beta$,

$$\sum_N|y_i-x_i\beta| \ge \sum_N|y_i| - \sum_N|x_i\beta| \ge \sum_N|y_i| - n\max_i|x_i\beta|.$$

With sufficiently large $c$, we have $\sum_N|y_i-x_ib_c|<\sum_N|y_i-x_i\beta|$ for all bounded $\beta$. This implies that there is a breakdown with $m_*+2$ outliers. The extension to $\ell_1$-type estimators follows from Lemma 4.3. ■

Remark. This result strengthens the close link between the tail performance of estimators under Type II errors and their finite-sample breakdown point. This connection is developed further below in the discussion of high-breakdown regression estimators.

There seems to be a common misapprehension about the $\ell_1$ estimator: that for fixed $x_{ij}$ the $\ell_1$ method is very robust, while for random $x_{ij}$ it is more fragile. Indeed, Donoho and Huber (1983) remark that the breakdown point of the $\ell_1$ estimator is $1/2$ when there is "corruption only in y". However, as the preceding result illustrates, the breakdown point of the $\ell_1$ estimator, and therefore its tail performance, can be quite poor even for "fixed" designs. Even in the relatively favorable case of uniformly spaced $x$'s, the breakdown point is roughly .3 in the through-the-origin model.

While $\ell_1$-type estimators can have rather poor tail performance, and consequently low breakdown points, the high-breakdown estimators for regression behave differently. We shall show that those recently introduced by Rousseeuw (1984) and Rousseeuw and Yohai (1984) possess good tail performance, as measured by (2.3), when the error distribution is of Type II. We will focus on the least median of squares (LMS) estimator proposed by Rousseeuw (1984), which solves

$$\min_b\ \operatorname{median}\{(y_1-x_1b)^2,\dots,(y_n-x_nb)^2\}.$$
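For a straight-line model the LMS criterion can be approximated by searching over lines through pairs of data points (elemental subsets), in the spirit of the resampling algorithms described by Rousseeuw and Leroy (1987). The Python sketch below does this exhaustively for $p=2$. It is an approximation in general, since the exact LMS line need not pass through two observations. It also uses `statistics.median`, which averages the two middle values when $n$ is even, rather than a specific order statistic. The data are made up for illustration: a noiseless line with 6 of 20 responses replaced by gross outliers. On these data the search recovers the line exactly, while least squares is dragged far away.

```python
import itertools
import statistics


def lms_elemental(xs, ys):
    """Approximate LMS for y = b0 + b1 x: try every line through two data points and
    keep the one with the smallest median squared residual."""
    best = None
    for i, j in itertools.combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical pair does not determine a line of this form
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        crit = statistics.median((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
        if best is None or crit < best[0]:
            best = (crit, b0, b1)
    return best[1], best[2]


def ols(xs, ys):
    """Ordinary least-squares intercept and slope, for comparison."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return ybar - (sxy / sxx) * xbar, sxy / sxx


xs = [float(i) for i in range(20)]
ys = [2.0 + 3.0 * x for x in xs]
for i in (3, 7, 11, 13, 17, 19):  # replace 6 of 20 responses by gross outliers
    ys[i] = 1000.0
print(lms_elemental(xs, ys))  # (2.0, 3.0)
print(ols(xs, ys))            # pulled far away from (2, 3)
```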

The breakdown point analysis of the LMS estimator is carefully carried out in Rousseeuw and Leroy (1987). Following their terminology, we shall say that the observations $\{(y_i,x_i): i=1,\dots,n\}$ are in general position if any $p$ of them give a unique determination of $b(h)=X(h)^{-1}y(h)$, where $h$ denotes the chosen subset of $p$ indices. Some slight modifications of their argument for contamination of both $y$ and $X$ lead to the following result for fixed design points.

LEMMA 4.3. For the LMS estimator with observations in general position, $m^*=[n/2]-p+2$.

The lower bound on the tail performance of the LMS estimator without restriction to observations in general position is closely related.

LEMMA 4.4. For $F$ of Type II, the LMS estimator has $\underline{B}\ge[n/2]-k+1$, where $k$ is the smallest integer such that $\inf_{\|b\|=1}|x_ib|_{(k)}>0$.

Proof. Since the LMS estimator is scale equivariant, it suffices to show that for some constant $C$,

$$\max_i|x_iT_n| \le C\,|y|_{(n-[n/2]+k)}.$$

The order statistic $|x_ib|_{(k)}$ is continuous in $b$, so $|x_ib|_{(k)}\ge c>0$ for all $\|b\|=1$; equivalently, $|x_ib|_{(k)}\ge c\|b\|$ for all $b$.

Now suppose there are only $[n/2]-k$ possible outliers, and all the other $n-[n/2]+k$ observations ($y_i$'s) are bounded by 1. We must show that the estimator $T_n$ is uniformly bounded by, say, $K$. This is true since, for more than half of the residuals (at least $n-[n/2]+1$ of them), $|y_i-x_iT_n|\ge c\|T_n\|-1$. This implies that $\operatorname{median}|y_i-x_iT_n|>1\ge\operatorname{median}|y_i|$ if $\|T_n\|>2/c$. Therefore the bound on $\|T_n\|$ can be $K=2/c$. ■

If the observations are in general position, then $k = p$. Combining Lemmas 4.3 and 4.4, we have $B \ge m^* - 1$. In fact, a more careful analysis enables one to strengthen Lemma 4.4 and thus to prove

THEOREM 4.4. For the LMS estimator with observations in general position, $B \ge m^* = [n/2] - p + 2$.

Proof. (Sketch) Suppose initially that $|y_i| \le 1$ for all $i = 1, 2, \ldots, n$. Then for some constant $C$, $\|T_n\| \le C$. Set $M = \sup \max_i |y_i - x_i T_n|$, the supremum being taken over all such $y$; this is also finite. Let $V$ denote a $(p-1)$-dimensional subspace of $\mathbb{R}^p$, and let $S(t)$ be the set of all $x_i$ whose distance to $V$ is no larger than $t$. Define

$$\delta = \inf_V \inf\{t > 0 : \#S(t) \ge p\}.$$

Clearly, $\delta$ depends only on the design points. If the observations are in general position, it can be shown that $\delta > 0$.

Finally, essentially the same geometric argument as in Rousseeuw and Leroy (1987, pp. 118-119) implies that $\|T_n\| \le 3C + 2M/\delta$ for $T_n$ obtained by using $[n/2] + p - 1$ "good" observations, i.e., observations with $|y_i| \le 1$. ■
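For $p = 2$ the constant $\delta$ can be approximated numerically. Take $V$ to be the line through the origin orthogonal to a unit vector $u$. The distance from $x_i$ to $V$ is then $|x_i u|$, so $\#S(t) \ge p$ exactly when the $p$-th smallest such distance is at most $t$. Reading the definition this way gives $\delta = \inf_{\|u\|=1} |x_i u|_{(p)}$, which is the largest admissible constant $c$ in the proof of Lemma 4.4 when $k = p$.

The sketch below uses a hypothetical function `delta_p2`. It minimizes over a finite grid of angles, so it approximates $\delta$ from above rather than computing the exact infimum.

```python
import math


def delta_p2(X, grid=20000):
    """Grid approximation (from above) of delta = inf_V inf{t: #S(t) >= 2}
    for design points x_i in R^2; V ranges over lines through the origin."""
    best = math.inf
    for j in range(grid):
        theta = math.pi * j / grid
        u = (math.cos(theta), math.sin(theta))  # unit normal of the line V
        dist = sorted(abs(x[0] * u[0] + x[1] * u[1]) for x in X)
        best = min(best, dist[1])  # #S(t) >= 2 exactly when t >= dist[1]
    return best


print(delta_p2([[1, 0], [0, 1]]))          # about 1/sqrt(2)
print(delta_p2([[1, 0], [2, 0], [0, 1]]))  # about 0: not in general position
```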

Remark. When the observations are not in general position, it can be shown that $m^* \le [n/2] - k + 2$. In that case we still have $B \ge m^* - 1$.

The lower bound for the tail performance of the LMS estimator may be extended to the general S-estimators of Rousseeuw and Yohai (1984). These minimize an estimate of scale $s(r_1, \ldots, r_n)$ derived from

$$n^{-1} \sum_{i=1}^n \rho(r_i/s) = \beta_0,$$

where $r_i = y_i - x_i b$. We use the fact that

$$a_0 \operatorname{median}|r_i| \le s(r_1, \ldots, r_n) \le a_1 \operatorname{median}|r_i|$$

for some positive constants $a_0$ and $a_1$. Together with the proof of Theorem 4.4, this shows that $B \ge m^* - 1$ for S-estimators when the observations are in general position (see He (1988) for details).
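The scale $s$ in the displayed equation can be computed by a one-dimensional root search. When $\rho$ is even and nondecreasing in $|u|$, the left-hand side is nonincreasing in $s$. The sketch below assumes Tukey's bisquare $\rho$, scaled so that $\rho(u) = 1$ for $|u| \ge c$, with the commonly quoted tuning constant $c = 1.547$ and $\beta_0 = 0.5$. These choices, and the names `rho_bisquare` and `s_scale`, are illustrative assumptions not taken from the text.

```python
import math
import statistics


def rho_bisquare(u, c=1.547):
    """Tukey's bisquare rho, scaled so that rho(u) = 1 for |u| >= c."""
    if abs(u) >= c:
        return 1.0
    return 1.0 - (1.0 - (u / c) ** 2) ** 3


def s_scale(r, beta0=0.5, c=1.547, iters=200):
    """Solve n^{-1} sum_i rho(r_i / s) = beta0 for s > 0 by bisection on log s."""
    n = len(r)
    nonzero = [abs(v) for v in r if v != 0.0]
    if len(nonzero) <= beta0 * n:
        return 0.0  # too many zero residuals: no positive solution

    def mean_rho(s):
        return sum(rho_bisquare(v / s, c) for v in r) / n

    # At lo every nonzero |r_i|/s >= 2c, so mean_rho(lo) = len(nonzero)/n > beta0;
    # at hi every |r_i|/s <= 1e-3, so mean_rho(hi) is far below beta0.
    lo, hi = min(nonzero) / (2 * c), max(nonzero) * 1e3
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if mean_rho(mid) > beta0:
            lo = mid  # average rho too large: s must increase
        else:
            hi = mid
    return math.sqrt(lo * hi)


r = [0.3, -1.2, 2.5, -0.7, 4.0, 0.1, -2.2]
s = s_scale(r)
print(s, s / statistics.median(abs(v) for v in r))  # s and s / median|r_i|
print(s_scale([3.0 * v for v in r]) / s)            # scale equivariance: close to 3
```

The ratio $s/\operatorname{median}|r_i|$ printed above is one instance of the sandwich inequality used in the text. The constants $a_0$ and $a_1$ themselves are not computed here.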

An upper bound on the tail performance of the LMS estimator for error distributions of Type II is given in the following result.

THEOREM 4.6 For $F$ of Type II, the LMS estimator has $B \le [n/2] + 1$.

Proof. Suppose $n$ is odd, and set $m = (n+1)/2$. We begin by showing that for sufficiently large $A$, $0 < A < a$,

$$P(|y|_{(m)} > a,\ |y|_{(m-1)} > A) \le P(|y|_{(m)} > a,\ |y|_{(m-1)} \le A). \tag{4.4}$$

In fact, for some combinatoric constant $c_n$,

$$P(|y|_{(m)} > a,\ |y|_{(m-1)} > A) \le c_n (2 - 2F(a))^m (2 - 2F(A))$$

and

$$P(|y|_{(m)} > a,\ |y|_{(m-1)} \le A) = c_n (2 - 2F(a))^m (2F(A) - 1)^{m-1}.$$

So for fixed $n$ and sufficiently large $A$, (4.4) holds.

Next, we suppose $\|x_i\| \le 1$ for all $i = 1, 2, \ldots, n$. We show that if $|y|_{(m)} > a$ and $|y|_{(m-1)} \le A$, then $\|T_n\| \ge \tfrac12 |y|_{(m)} - A$. Note that for bounded $\|b\|$, $\operatorname{median}|y_i - x_i b| \ge |y|_{(m)} - \|b\|$. Now choose $b_0$ such that $\|b_0\| = \tfrac12 |y|_{(m)}$ and, for the $i$ corresponding to $|y|_{(m)}$, $r_i = \tfrac12 |y|_{(m)}$. When $x_{i1} = 1$, this can be accomplished by setting $b_0 = (\tfrac12 |y|_{(m)}, 0, \ldots, 0)$. For this $b_0$ and any $i$ such that $|y_i| \le |y|_{(m-1)}$,

$$|y_i - x_i b_0| \le |y|_{(m-1)} + \|x_i\|\,\|b_0\| \le A + \tfrac12 |y|_{(m)}.$$

Hence, including $|y|_{(m)}$, $|y_i - x_i b_0| \le \tfrac12 |y|_{(m)} + A$ for $m$ subscripts, and

$$\operatorname{median}|y_i - x_i b_0| \le \tfrac12 |y|_{(m)} + A.$$

Thus, if $\tfrac12 |y|_{(m)} + A < |y|_{(m)} - \|b\|$, that is, $\|b\| < \tfrac12 |y|_{(m)} - A$, then the median residual at $b$ exceeds that at $b_0$, so it cannot be minimized at $b$. Consequently, $\|T_n\| \ge \tfrac12 |y|_{(m)} - A$. Finally, from (4.4),

$$\begin{aligned}
P(|y|_{(m)} > a) &= P(|y|_{(m)} > a,\ |y|_{(m-1)} > A) + P(|y|_{(m)} > a,\ |y|_{(m-1)} \le A) \\
&\le 2P(|y|_{(m)} > a,\ |y|_{(m-1)} \le A) \le 2P(\|T_n\| > a/2 - A),
\end{aligned}$$

and it follows from the Type II behavior of $F$ that $B \le (n+1)/2$.
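Inequality (4.4) can be checked exactly for a concrete heavy-tailed distribution. The sketch below assumes standard Cauchy errors, which are symmetric with algebraic tails, so $P(|y| > a) = 2 - 2F(a)$. Each $|y_i|$ is classified as exceeding $a$, lying in $(A, a]$, or at most $A$, and the two probabilities in (4.4) become sums of multinomial terms. The function name `prob_pair` is hypothetical.

```python
import math


def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi


def prob_pair(n, a, A, F=cauchy_cdf):
    """Return (P(|y|_(m) > a, |y|_(m-1) > A), P(|y|_(m) > a, |y|_(m-1) <= A))
    for n odd, m = (n + 1) / 2, and i.i.d. symmetric errors with cdf F."""
    assert n % 2 == 1 and 0 < A < a
    m = (n + 1) // 2
    p_hi = 2 - 2 * F(a)          # P(|y| > a)
    p_lo = 2 * F(A) - 1          # P(|y| <= A)
    p_mid = 1 - p_hi - p_lo      # P(A < |y| <= a)
    high = low = 0.0
    for j in range(n + 1):              # j = #{i : |y_i| > a}
        for l in range(n - j + 1):      # l = #{i : A < |y_i| <= a}
            coef = math.factorial(n) // (math.factorial(j) * math.factorial(l)
                                         * math.factorial(n - j - l))
            pr = coef * p_hi ** j * p_mid ** l * p_lo ** (n - j - l)
            if j >= n - m + 1:          # |y|_(m) > a
                if j + l >= n - m + 2:  # |y|_(m-1) > A
                    high += pr
                else:
                    low += pr
    return high, low


print(prob_pair(5, a=100.0, A=10.0))  # (4.4) holds: first value is smaller
print(prob_pair(5, a=100.0, A=0.5))   # A too small: (4.4) fails
```

The second call illustrates why the proof requires $A$ to be sufficiently large.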

For $n$ even, we take the median of the $n$ ordered squared residuals $\{r_{(1)}^2, \ldots, r_{(n)}^2\}$ to be $\tfrac12 (r_{(n/2)}^2 + r_{(1+n/2)}^2)$. Let $z_1 = |y|_{(n/2)}$, $z_2 = |y|_{(1+n/2)}$ and $z_3 = \max_{l < n/2} |y|_{(l)}$. An argument similar to that for (4.4) shows that

$$P(z_1 > a) \le 2P(z_1 > a,\ z_3 \le A)$$

for sufficiently large $A$. It remains to show that when $z_1 > a$ and $z_3 \le A$ (as $a$ becomes large enough), the LMS estimator satisfies $\|T_n\| > a/4$. For $\|b\| \le a/4$, we have

$$\operatorname{median}\{r_i^2\} \ge \tfrac12 (z_1^2 + z_2^2) + \|b\|^2 - \|b\|(z_1 + z_2).$$

On the other hand, if we choose $b_0 = (\tfrac12 z_1, 0, \ldots, 0)$, then

$$\operatorname{median}\{r_i^2\} \le \tfrac12 \big(\tfrac14 z_1^2 + (z_2 - \tfrac12 z_1)^2\big) = \tfrac12 (z_1^2 + z_2^2) - \big(\tfrac14 z_1^2 + \tfrac12 z_1 z_2\big).$$

One can then show that if $\|b\| \le a/4$, then $\tfrac14 z_1^2 + \tfrac12 z_1 z_2 > \|b\|(z_1 + z_2)$. This implies that $\operatorname{median}\{r_i^2\}$ cannot be minimized at such a $b$. Since $P(|y|_{(n/2)} > a) \le 2P(\|T_n\| > a/4)$ implies $B \le 1 + n/2$, the proof is complete. ■
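The algebra behind the step "one can then show" may be written out as follows. This is a reconstruction under the assumptions of the proof, not a quotation of the original argument. The assumptions are $\|x_i\| \le 1$, $z_1 > a$, and $\|b\| \le a/4$. They give $z_2 \ge z_1 > \|b\|$, so the order-statistic lower bound $|r|_{(j)} \ge |y|_{(j)} - \|b\| > 0$ may be squared termwise.

```latex
\begin{aligned}
\tfrac12\big(\tfrac14 z_1^2 + (z_2 - \tfrac12 z_1)^2\big)
  &= \tfrac14 z_1^2 + \tfrac12 z_2^2 - \tfrac12 z_1 z_2
   = \tfrac12 (z_1^2 + z_2^2) - \big(\tfrac14 z_1^2 + \tfrac12 z_1 z_2\big), \\
\|b\|(z_1 + z_2) - \|b\|^2
  &\le \|b\|(z_1 + z_2) \le \tfrac{a}{4}(z_1 + z_2)
   < \tfrac14 z_1^2 + \tfrac14 z_1 z_2 \le \tfrac14 z_1^2 + \tfrac12 z_1 z_2 .
\end{aligned}
```

Subtracting the second line from $\tfrac12(z_1^2 + z_2^2)$ shows that the lower bound for $\operatorname{median}\{r_i^2\}$ at any such $b$ strictly exceeds the upper bound attained at $b_0$.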

Remark. The theorem also holds if $T_n$ minimizes the median absolute residual instead of the median squared residual. A further extension can be made to $T_n$ that minimizes the absolute-residual order statistic $|r|_{(n-k)}$; in that case, $B \le k + 1$. A special case, with $k = [(n-p+1)/2]$, is considered above in the Corollary to Lemma 2.1. Combining the lower bound there with the upper bound here, we have in this case $[(n-p+1)/2] \le \underline{B} \le \overline{B} \le [(n-p+1)/2] + 1$.
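The integer bounds in this section are simple to tabulate. The sketch below uses a hypothetical helper `tail_bounds` and takes $[\cdot]$ as the integer part. For a design in general position, it evaluates:

- the LMS lower bound $m^* = [n/2] - p + 2$ of Theorem 4.4;
- the upper bound $[n/2] + 1$ of Theorem 4.6;
- the two-sided bracket stated in the Remark, with $k = [(n-p+1)/2]$.

```python
def tail_bounds(n, p):
    """Integer bounds on the tail performance B for Type II errors,
    with [x] taken as the integer part (n, p positive integers)."""
    lms_lower = n // 2 - p + 2   # m* (Theorem 4.4, general position)
    lms_upper = n // 2 + 1       # Theorem 4.6
    k = (n - p + 1) // 2         # k = [(n - p + 1)/2] from the Remark
    return {"lms": (lms_lower, lms_upper), "order_stat": (k, k + 1)}


for n, p in [(20, 1), (20, 3), (21, 3)]:
    print(n, p, tail_bounds(n, p))
```

As arithmetic on the stated bounds, the LMS bracket has width $p - 1$. For $p = 1$ the lower bound of Theorem 4.4 and the upper bound of Theorem 4.6 therefore coincide.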

5. Conclusion

The results of the preceding section emphasize the close link, suggested in the introduction, between the tail performance of estimators and their finite-sample breakdown points.
Theorem 1.1 clarifies this relationship for a broad class of location estimators. In the regression context, the breakdown point of $\ell_1$-type estimators is seen to be essentially the same as the lower bound on their tail performance under heavy-tailed error conditions. For the least median of squares estimator, and indeed for the broad class of S-estimators as well as for the least squares estimator, we find that tail performance is bounded below by the breakdown point under Type II error conditions. Others, including Donoho and Huber (1983), Hampel et al. (1986), and Rousseeuw and Leroy (1987), have made a persuasive case that the breakdown point of estimators is an important "figure of merit" in the assessment of quantitative robustness. The conjunction of breakdown and tail performance affords, in our view, a rewarding new window on the robustness scene.